Exploring EVENTS

Experiments

    1. Visualising Events Dataframe
    2. Exploring Tags Events
    3. Calculating Events Description Similarity
    4. Calculating Events Description Topic Modelling
    5. Exploring the Schedules of Events
      • 5.1 Getting the Frequency of Starting Dates of Events Schedules
      • 5.2 Getting the Frequency of End Dates of Events Schedules
    6. Exploring the Performances Tickets of Events Schedules
      • 6.1 Getting the Frequency of Price Tickets
      • 6.2 Getting the frequency of type (Standard, Children) tickets
      • 6.3 Exploring Performances Places - ATTENTION: Merging information with "places" dataframe!
        • 6.3.1 Frequency of Performances per town
        • 6.3.2 Frequency of Type tickets per town
        • 6.3.3 Frequency of Price tickets type per town
        • 6.3.4 Frequency of Max_Price tickets per town
          • 6.3.4.1 Frequency of Free tickets per town
          • 6.3.4.2 Frequency of No Free tickets per town
      • 6.4 Selecting Scottish Cities: Edinburgh, Glasgow, Dundee, Perth, Inverness, Aberdeen, St Andrews
        • 6.4.1 Frequency of Price Tickets per Scottish City
        • 6.4.2 Frequency of Type Tickets per Scottish City
        • 6.4.3 Frequency of Schedules Dates per Event and per Scottish City
        • 6.4.4 Grouping Schedules per Event and Scottish City
        • 6.4.5 Exploring Tags per Schedule and Scottish Cities
          • 6.4.5.1 Exploring the Frequency of schedules tags for Edinburgh
          • 6.4.5.2 Exploring the Frequency of schedules tags for Glasgow
        • 6.4.6 Histograms of starting/end schedules dates for Edinburgh
        • 6.4.7 Working with Schedule tags, Scottish cities, Starting/End Time
          • 6.4.7.1 Frequency of schedules Starting Date in Scottish City
          • 6.4.7.2 Frequency of schedules Ending Date in Scottish City
          • 6.4.7.3 Scheduled tags and Starting Dates in Scottish City
          • 6.4.7.4 Scheduled tags and Ending Dates in Scottish City

0. Importing libraries and loading the JSON events file (2964 events) into a dataframe

In [1]:
import json
import pandas as pd
import plotly.express as px
import os
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import pickle
import plotly.graph_objects as go
import numpy as np
In [2]:
with open('dataset/sample_20210501.json', 'r') as f:
    data = json.load(f)
    print(len(data["events"]))
    events=data["events"]
df = pd.DataFrame(events)
2964

1. Visualizing the events dataframe

In [3]:
df
Out[3]:
event_id modified_ts created_ts name sort_name status id schedules descriptions website tags category properties ranking_level ranking_in_level phone_numbers alternative_names
0 401143 2022-01-09T01:19:40Z 2014-04-10T03:16:09Z Al Murray: Landlord of Hope and Glory Al Murray: Landlord of Hope and Glory live 401143 [{'start_ts': '2021-06-05T19:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://thepublandlord.com/ [Comedy, Stand-up] Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN
1 882166 2021-12-11T05:45:26Z 2017-11-07T16:43:35Z Catherine Bohart: Work in Progress Catherine Bohart: Work in Progress live 882166 [{'start_ts': '2021-08-04T18:45:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... NaN [Comedy, Stand-up] Comedy {'dropin_event': False, 'booking_essential': F... 2 2 NaN NaN
2 948902 2022-01-19T01:40:20Z 2018-03-03T05:58:00Z Hal Cruttenden: It's Best You Hear It From Me Hal Cruttenden: It's Best You Hear It From Me live 948902 [{'start_ts': '2021-08-11T18:45:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.halcruttenden.com [Comedy, Stand-Up, Stand-up] Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN
3 1200636 2022-01-18T01:39:28Z 2019-01-31T17:50:02Z Fern Brady: Autistic Bikini Queen Fern Brady: Autistic Bikini Queen live 1200636 [{'start_ts': '2021-08-06T18:45:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://fernbrady.co.uk/ [Comedy, Stand-up] Comedy {'dropin_event': False, 'booking_essential': F... 2 2 NaN NaN
4 1370307 2021-09-18T01:04:09Z 2019-08-05T10:26:52Z Rob Auton: The Time Show Rob Auton: The Time Show live 1370307 [{'start_ts': '2021-09-18T17:00:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... NaN [Comedy, Rob Auton, Stand-up] Comedy {'dropin_event': False, 'booking_essential': F... 2 2 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2959 1712071 2021-09-02T17:43:01Z 2021-09-02T15:23:23Z Tayport Climate Festival Tayport Climate Festival live 1712071 [{'start_ts': '2021-09-24T00:00:00+01:00', 'en... [{'type': 'description.official', 'description... NaN [Days out, Nature, Scottish Festivals] Days out {'dropin_event': False, 'booking_essential': F... 3 1 NaN NaN
2960 1672067 2021-09-21T14:23:39Z 2021-09-21T14:23:39Z Chopin & Champagne By Candlelight | Four Ball... Chopin & Champagne By Candlelight | Four Ball... live 1672067 [{'start_ts': '2021-09-25T18:00:00+01:00', 'en... [{'type': 'description.official', 'description... NaN [Music] Music {'list:importance': 'l', 'dropin_event': False... 3 2 NaN NaN
2961 1724456 2021-09-29T09:31:55Z 2021-09-28T10:38:29Z Edinburgh Multicultural Festival 2021 Edinburgh Multicultural Festival 2021 live 1724456 [{'start_ts': '2021-10-03T13:00:00+01:00', 'en... [{'type': 'description.official', 'description... NaN [Dance, Festival, Music, Stand-up, Visual art] Dance {'dropin_event': False, 'expected_visit_durati... 3 1 NaN NaN
2962 1726176 2021-10-01T17:43:02Z 2021-10-01T11:10:33Z Blunt Knife Co. presents an exhibition by This... Blunt Knife Co. presents an exhibition by This... live 1726176 [{'start_ts': '2021-10-01T18:00:00+01:00', 'en... [{'type': 'description.official', 'description... NaN [Exhibition, Visual art] Visual art {'dropin_event': False, 'expected_visit_durati... 3 2 NaN NaN
2963 1731142 2021-10-12T12:16:00Z 2021-10-11T15:12:08Z People, Places, Perspectives Exhibition People, Places, Perspectives Exhibition live 1731142 [{'start_ts': '2021-10-08T09:30:00+01:00', 'en... [{'type': 'description.official', 'description... NaN [Contemporary, Mixed Media, Painting & Drawing... Visual art {'dropin_event': False, 'expected_visit_durati... 3 1 NaN NaN

2964 rows × 17 columns

In [4]:
## selecting some columns

Experiment 2: Exploring Tags Events

We are going to separate the elements stored in each tag list into new rows, one row per tag.

In [5]:
df["tags"][0:5]
Out[5]:
0               [Comedy, Stand-up]
1               [Comedy, Stand-up]
2     [Comedy, Stand-Up, Stand-up]
3               [Comedy, Stand-up]
4    [Comedy, Rob Auton, Stand-up]
Name: tags, dtype: object
In [6]:
df_tags=df.explode('tags')
In [7]:
df_tags
Out[7]:
event_id modified_ts created_ts name sort_name status id schedules descriptions website tags category properties ranking_level ranking_in_level phone_numbers alternative_names
0 401143 2022-01-09T01:19:40Z 2014-04-10T03:16:09Z Al Murray: Landlord of Hope and Glory Al Murray: Landlord of Hope and Glory live 401143 [{'start_ts': '2021-06-05T19:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://thepublandlord.com/ Comedy Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN
0 401143 2022-01-09T01:19:40Z 2014-04-10T03:16:09Z Al Murray: Landlord of Hope and Glory Al Murray: Landlord of Hope and Glory live 401143 [{'start_ts': '2021-06-05T19:30:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://thepublandlord.com/ Stand-up Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN
1 882166 2021-12-11T05:45:26Z 2017-11-07T16:43:35Z Catherine Bohart: Work in Progress Catherine Bohart: Work in Progress live 882166 [{'start_ts': '2021-08-04T18:45:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... NaN Comedy Comedy {'dropin_event': False, 'booking_essential': F... 2 2 NaN NaN
1 882166 2021-12-11T05:45:26Z 2017-11-07T16:43:35Z Catherine Bohart: Work in Progress Catherine Bohart: Work in Progress live 882166 [{'start_ts': '2021-08-04T18:45:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... NaN Stand-up Comedy {'dropin_event': False, 'booking_essential': F... 2 2 NaN NaN
2 948902 2022-01-19T01:40:20Z 2018-03-03T05:58:00Z Hal Cruttenden: It's Best You Hear It From Me Hal Cruttenden: It's Best You Hear It From Me live 948902 [{'start_ts': '2021-08-11T18:45:00+01:00', 'en... [{'type': 'description.list.default', 'descrip... http://www.halcruttenden.com Comedy Comedy {'dropin_event': False, 'booking_essential': F... 2 1 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2963 1731142 2021-10-12T12:16:00Z 2021-10-11T15:12:08Z People, Places, Perspectives Exhibition People, Places, Perspectives Exhibition live 1731142 [{'start_ts': '2021-10-08T09:30:00+01:00', 'en... [{'type': 'description.official', 'description... NaN Painting & Drawing Visual art {'dropin_event': False, 'expected_visit_durati... 3 1 NaN NaN
2963 1731142 2021-10-12T12:16:00Z 2021-10-11T15:12:08Z People, Places, Perspectives Exhibition People, Places, Perspectives Exhibition live 1731142 [{'start_ts': '2021-10-08T09:30:00+01:00', 'en... [{'type': 'description.official', 'description... NaN Photography Visual art {'dropin_event': False, 'expected_visit_durati... 3 1 NaN NaN
2963 1731142 2021-10-12T12:16:00Z 2021-10-11T15:12:08Z People, Places, Perspectives Exhibition People, Places, Perspectives Exhibition live 1731142 [{'start_ts': '2021-10-08T09:30:00+01:00', 'en... [{'type': 'description.official', 'description... NaN Prints Visual art {'dropin_event': False, 'expected_visit_durati... 3 1 NaN NaN
2963 1731142 2021-10-12T12:16:00Z 2021-10-11T15:12:08Z People, Places, Perspectives Exhibition People, Places, Perspectives Exhibition live 1731142 [{'start_ts': '2021-10-08T09:30:00+01:00', 'en... [{'type': 'description.official', 'description... NaN Sculpture Visual art {'dropin_event': False, 'expected_visit_durati... 3 1 NaN NaN
2963 1731142 2021-10-12T12:16:00Z 2021-10-11T15:12:08Z People, Places, Perspectives Exhibition People, Places, Perspectives Exhibition live 1731142 [{'start_ts': '2021-10-08T09:30:00+01:00', 'en... [{'type': 'description.official', 'description... NaN Visual art Visual art {'dropin_event': False, 'expected_visit_durati... 3 1 NaN NaN

7452 rows × 17 columns

In [8]:
g_tags=df_tags.groupby(['tags']).size().reset_index()
g_tags=g_tags.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
g_tags
Out[8]:
     tags      number_of_times
302  Music     820
100  Comedy    461
127  Days out  375
187  Film      373
470  Theatre   347
...  ...       ...
224  Handmade  1
221  Grunge    1
220  Grime     1
219  Golf      1
526  vintage   1

527 rows × 2 columns

In [9]:
fig = px.line(g_tags, x="tags", y="number_of_times", title='Number of times that each tag appears')
fig.show()
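
As a rough illustration of the explode-and-count step in this experiment, the sketch below applies the same idea to a small made-up dataframe (the `events` frame and its values are hypothetical, not taken from the dataset). It uses `value_counts` instead of `groupby(...).size()`, which should give the same counts already sorted in descending order.

```python
import pandas as pd

# Hypothetical toy data: one list of tags per event
events = pd.DataFrame({
    "event_id": [1, 2, 3],
    "tags": [["Comedy", "Stand-up"], ["Comedy"], ["Music", "Jazz"]],
})

# explode() creates one row per tag; value_counts() counts each tag
tag_counts = (
    events.explode("tags")["tags"]
    .value_counts()
    .rename_axis("tags")
    .reset_index(name="number_of_times")
)
print(tag_counts)
```

Note that tags are compared as exact strings, so spelling variants such as "Stand-Up" and "Stand-up" (both visible in Out[5]) are counted as different tags.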

Experiment 3: Description Similarity

Exploding the descriptions column

Each descriptions cell holds a list of descriptions. We will create a new row for each element in that list.

In [10]:
df["descriptions"][0:5]
Out[10]:
0    [{'type': 'description.list.default', 'descrip...
1    [{'type': 'description.list.default', 'descrip...
2    [{'type': 'description.list.default', 'descrip...
3    [{'type': 'description.list.default', 'descrip...
4    [{'type': 'description.list.default', 'descrip...
Name: descriptions, dtype: object
In [11]:
df_descriptions=df.explode('descriptions')
In [12]:
df_d=pd.concat([df_descriptions.drop(['descriptions'], axis=1), df_descriptions['descriptions'].apply(pd.Series)], axis=1)
In [13]:
df_desc=df_d[["event_id", "description"]]
In [14]:
df_desc
Out[14]:
event_id description
0 401143 A brand new show from everyone's favourite big...
0 401143 Citizens of Hope and Glory! Our new tomorrow b...
1 882166 A Work In Progress show from Funny Woman and B...
1 882166 Catherine Bohart is an award-winning comedian,...
2 948902 Observational humour from the TV regular.
... ... ...
2961 1724456 Edinburgh Multicultural Festival is returning ...
2962 1726176 Works will be sold in-store, and you can meet ...
2962 1726176 Works will be sold in-store, and you can meet ...
2963 1731142 In their inaugural exhibition, following exten...
2963 1731142 In their inaugural exhibition, following exten...

5054 rows × 2 columns
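
The explode-then-expand pattern used for the descriptions column can be sketched on toy data as follows. This version swaps `apply(pd.Series)` for `pd.json_normalize`, which is an alternative rather than what the notebook does. The `events` frame and its texts are invented for illustration, and the sketch assumes every cell holds a non-empty list of dicts.

```python
import pandas as pd

# Hypothetical toy data: each event carries a list of description dicts
events = pd.DataFrame({
    "event_id": [1, 2],
    "descriptions": [
        [
            {"type": "description.list.default", "description": "Short blurb"},
            {"type": "description.official", "description": "Longer official text"},
        ],
        [{"type": "description.official", "description": "Only one description"}],
    ],
})

# One row per description dict; a fresh index keeps the concat below aligned
exploded = events.explode("descriptions").reset_index(drop=True)

# Turn each dict into columns ("type", "description")
expanded = pd.json_normalize(exploded["descriptions"].tolist())

flat = pd.concat([exploded.drop(columns="descriptions"), expanded], axis=1)
print(flat[["event_id", "description"]])
```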

Finding events with similar descriptions - Deep Learning - Transformers

In [15]:
# removing the rows whose description is empty
df_desc1=df_desc.dropna(subset=['description']).reset_index()
In [16]:
df_desc1[0:5]
Out[16]:
index event_id description
0 0 401143 A brand new show from everyone's favourite big...
1 0 401143 Citizens of Hope and Glory! Our new tomorrow b...
2 1 882166 A Work In Progress show from Funny Woman and B...
3 1 882166 Catherine Bohart is an award-winning comedian,...
4 2 948902 Observational humour from the TV regular.
In [17]:
# total number of rows with descriptions
df_desc1.shape[0]
Out[17]:
5039
In [18]:
# selecting the description column
documents=df_desc1["description"].values
In [19]:
#documents
In [20]:
#d=documents[0:100]
d=documents[:]
In [21]:
# Using all-MiniLM-L6-v2 Transformer
model = SentenceTransformer('all-MiniLM-L6-v2')
In [22]:
# Computing our text_embeddings: encoding the available descriptions with the pre-trained all-MiniLM-L6-v2 Transformer (no training happens here)
text_embeddings = model.encode(d, batch_size = 8, show_progress_bar = True)

In [23]:
np.shape(text_embeddings)
Out[23]:
(5039, 384)
In [24]:
### A small example of how to get an embedding vector from a single description
In [25]:
first_description=df_desc1["description"].iloc[0]
first_description
first_description_embedding= model.encode(first_description, batch_size = 8, show_progress_bar = True)

Finding the similarity between documents

In [26]:
similarity_def=cosine_similarity(
    [first_description_embedding],
    text_embeddings)
In [27]:
similarities = cosine_similarity(text_embeddings)
print('pairwise dense output:\n {}\n'.format(similarities))
pairwise dense output:
 [[1.         0.3291993  0.43793815 ... 0.04365206 0.12979887 0.12979887]
 [0.3291993  1.0000001  0.21279764 ... 0.0174488  0.18835315 0.18835315]
 [0.43793815 0.21279764 1.0000002  ... 0.15264964 0.1544343  0.1544343 ]
 ...
 [0.04365206 0.0174488  0.15264964 ... 0.9999999  0.37924892 0.37924892]
 [0.12979887 0.18835315 0.1544343  ... 0.37924892 1.         1.        ]
 [0.12979887 0.18835315 0.1544343  ... 0.37924892 1.         1.        ]]
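
For readers unfamiliar with the measure, cosine similarity between two vectors is their dot product divided by the product of their norms. The sketch below reimplements that definition with NumPy on three made-up 2-dimensional vectors. The function name `cosine_similarity_matrix` and the toy vectors are assumptions for illustration, and it is not meant to replace the scikit-learn call used above.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    # Scale every row to unit length, then take all pairwise dot products
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalised = embeddings / norms
    return normalised @ normalised.T

# Hypothetical toy "embeddings"
toy = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
sims = cosine_similarity_matrix(toy)
print(sims)
```

The diagonal is the similarity of each vector with itself. Values such as 1.0000001 in the real output are most likely floating-point rounding of the 32-bit embeddings rather than a genuine similarity above 1.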

In [28]:
similarities_sorted = similarities.argsort()
similarities_sorted
Out[28]:
array([[2631, 4860, 1486, ..., 4211, 3514,    0],
       [3680, 3681, 4491, ..., 4293, 4292,    1],
       [1414, 2555, 2556, ..., 4831,  330,    2],
       ...,
       [ 802,  801, 3710, ..., 1904, 5035, 5036],
       [ 652, 1561, 2046, ..., 2716, 5037, 5038],
       [ 652, 1561, 2046, ..., 2716, 5037, 5038]])
In [29]:
id_1 = []
id_2 = []
score = []
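for index,array in enumerate(similarities_sorted):
    p=len(array)
    id_1.append(index)
    # array[-1] is normally the document itself (score 1.0), so array[-2] is taken as its nearest neighbour;
    # when exact duplicate descriptions exist, array[-2] can be the document itself (see row 5035 in Out[30])
    id_2.append(array[-2])
    score.append(similarities[index][array[-2]])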
index_df = pd.DataFrame({'id_1' : id_1,
                          'id_2' : id_2,
                          'score' : score})
print(p)
5039
In [30]:
index_df
Out[30]:
      id_1  id_2  score
0     0     3514  0.532041
1     1     4292  0.544361
2     2     330   0.613548
3     3     4831  0.630734
4     4     3140  0.624324
...   ...   ...   ...
5034  5034  5033  1.000000
5035  5035  5035  1.000000
5036  5036  5035  1.000000
5037  5037  5037  1.000000
5038  5038  5037  1.000000

5039 rows × 3 columns
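
In Out[30], row 5035 is paired with itself: when two descriptions are identical, both have score 1.0 and the second-to-last position of the sorted row is not guaranteed to be the other document. One possible way to avoid this, sketched below on a made-up 3x3 matrix, is to mask the diagonal before taking the maximum. The helper name `nearest_neighbours` and the toy values are assumptions, not part of the notebook.

```python
import numpy as np

def nearest_neighbours(similarities):
    """For each row, return the index and score of the most similar *other* row."""
    masked = similarities.astype(float)  # astype returns a copy, the input is untouched
    np.fill_diagonal(masked, -np.inf)    # a document can never be its own neighbour
    nearest = masked.argmax(axis=1)
    scores = masked[np.arange(len(masked)), nearest]
    return nearest, scores

# Hypothetical toy matrix: documents 1 and 2 are exact duplicates
sims = np.array([
    [1.0, 0.2, 0.2],
    [0.2, 1.0, 1.0],
    [0.2, 1.0, 1.0],
])
nearest, scores = nearest_neighbours(sims)
print(nearest, scores)
```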

Finding the 10 descriptions most similar to document 3

In [31]:
## Let's take document 3
doc_index = 3
documents[doc_index]
Out[31]:
'Catherine Bohart is an award-winning comedian, writer and actor.\n\nCatherine began performing stand-up in 2015 and has enjoyed a rapid rise through the ranks of UK and Irish comedy, becoming a finalist in both the BBC New Comedy Awards and Funny Women the very next year.\n\n2016 also saw Catherine and fellow comic Cally Beaton perform their show Catcall for a full run at the Edinburgh Fringe, enjoying packed houses and stellar reviews. The following August, Catherine joined the line-up of the prestigious Pleasance Comedy Reserve.\n\nCatherine took her debut solo show, Immaculate, to the Edinburgh Fringe in 2018. Immaculate received widespread acclaim, with The Times describing it as ?the sort of perfectly structured Edinburgh debut you always hope for and rarely get to see? in their four star review. Catherine returned to the festival in 2019 with her second hour-long show, Lemon. It enjoyed a month of sold-out performances and excellent reviews before taking the show on UK and Irish tour, including a run of jam-packed performances at London?s Soho Theatre Mainspace.\n\nCatherine has also taken her talents overseas. Spring 2019 saw Catherine selected to join the line-up for New Order, which showcases the best in original and trendsetting UK comedy at Melbourne International Comedy Festival.\n\nCatherine?s  broadcasting career has seen her appear on E4?s 8 Out Of 10 Cats, Comedy Central UK?s Roast Battle, ITV2?s The Stand Up Sketch Show, Dave?s Jon Richardson: Ultimate Worrier and as a regular correspondent on BBC2?s The Mash Report. Catherine has written material for BBC shows The Now Show, The News Quiz, Newsjack and Frankie Boyle?s New World Order and E4?s Savage Socials. She has co-hosted BBC Radio 4 Extra Comedy Club with Arthur Smith, hosted a series of Funny From the Fringe for BBC Radio 4 Extra and was a guest on Reasons To Be Cheerful with Ed Miliband and Geoff Lloyd. 
In addition, Catherine was named The Times Comedy Face To Watch for 2019 and featured on the BBC New Talent Hotlist for 2017.\n\n2020 has also seen Catherine launch hit new BBC Radio 4 relationships podcast You?ll Do which she hosts with fellow comic Sarah Keyworth.\n\nCatherine is also a trained actor, having studied an MA in Acting for Screen at the Royal Central School of Speech and Drama.'
In [32]:
results={}
for i in range(-2, -12, -1):
    similar_index=similarities_sorted[doc_index][i]
    rank=similarities[doc_index][similar_index]
    results[similar_index]=[rank]
In [33]:
results
Out[33]:
{4831: [0.63073367],
 4830: [0.63001806],
 3058: [0.605795],
 3500: [0.5774411],
 3888: [0.5774411],
 1937: [0.5730407],
 1938: [0.5730407],
 2649: [0.5729476],
 3464: [0.5729476],
 3718: [0.5682135]}
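
The top-10 lookup above can also be wrapped in a small reusable function. The sketch below is one possible formulation on a made-up 4x4 similarity matrix. The name `top_k_similar` is hypothetical, and unlike the loop above it removes the query document by index instead of assuming it sits in the last sorted position.

```python
import numpy as np

def top_k_similar(similarities, doc_index, k=10):
    """Return (index, score) pairs of the k documents most similar to doc_index."""
    row = similarities[doc_index]
    order = np.argsort(row)[::-1]          # most similar first
    order = order[order != doc_index][:k]  # drop the query document itself
    return [(int(i), float(row[i])) for i in order]

# Hypothetical toy similarity matrix
sims = np.array([
    [1.0, 0.9, 0.1, 0.5],
    [0.9, 1.0, 0.3, 0.4],
    [0.1, 0.3, 1.0, 0.2],
    [0.5, 0.4, 0.2, 1.0],
])
top_two = top_k_similar(sims, doc_index=0, k=2)
print(top_two)
```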

Experiment 4: Description Topic Modelling - Deep Learning - BERTopic

Let's model the topics of our descriptions. We are going to reuse the text_embeddings calculated in the previous experiment.

In [34]:
len(documents)
Out[34]:
5039
In [35]:
topic_model = BERTopic(min_topic_size=20).fit(documents, text_embeddings)
In [36]:
topics, probs = topic_model.transform(documents, text_embeddings)

Visualizing our topics

In [37]:
topic_model.visualize_topics()
In [38]:
#### Visualizing the first 5 keywords of our first 5 topics
In [39]:
topic_model.visualize_barchart()

Visualizing the similarity between topics

In [40]:
topic_model.visualize_heatmap()

Getting the frequency of each topic.

Topic -1 collects the outlier documents that BERTopic did not assign to any topic, so it should be ignored when interpreting the results.

In [41]:
# Let's see the frequency of the first 10 topics
topic_model.get_topic_freq()[0:10]
Out[41]:
   Topic  Count
0  -1     1613
1  0      1068
2  1      366
3  2      318
4  3      178
5  4      151
6  5      126
7  6      125
8  7      119
9  8      94
In [42]:
print("Number of topics found %s" %len(topic_model.get_topic_freq()))
Number of topics found 36
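
Since topic -1 holds the documents BERTopic treats as outliers, it can help to separate it from the assigned topics and check how large it is. The sketch below does this on a hand-copied excerpt of the frequency table from Out[41], so it does not need BERTopic itself. The variable names are assumptions, and the document total of 5039 comes from Out[34].

```python
import pandas as pd

# First rows of the frequency table shown in Out[41], copied by hand
topic_freq = pd.DataFrame({
    "Topic": [-1, 0, 1, 2],
    "Count": [1613, 1068, 366, 318],
})
n_documents = 5039  # len(documents), from Out[34]

# Keep only the real topics
assigned = topic_freq[topic_freq["Topic"] != -1]

# Share of documents left in the outlier topic
outlier_count = int(topic_freq.loc[topic_freq["Topic"] == -1, "Count"].sum())
outlier_share = outlier_count / n_documents
print(assigned)
print(f"Outlier share: {outlier_share:.1%}")
```

With these numbers roughly 32% of the descriptions end up in the outlier topic, which is worth keeping in mind when reading the topic visualisations.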

Visualizing the keywords of our topics.

In [43]:
#topic_model.get_topics()
In [44]:
document_3_topic=topics[3]
print("The topic of the document 3 is %s " %document_3_topic)
The topic of the document 3 is 2 
In [45]:
topic_model.get_topic(0)
Out[45]:
[('music', 0.016787011627023216),
 ('band', 0.015265536170597503),
 ('their', 0.013648317962900546),
 ('album', 0.013599094093463151),
 ('from', 0.012503052152635365),
 ('songs', 0.011924753501750394),
 ('jazz', 0.011502297963565035),
 ('with', 0.01119471677775723),
 ('as', 0.010602805008380912),
 ('rock', 0.01040010171154091)]
In [46]:
df_desc1[3:4]
Out[46]:
index event_id description
3 1 882166 Catherine Bohart is an award-winning comedian,...
In [47]:
topic_model.get_topic(document_3_topic)
Out[47]:
[('comedy', 0.04458662941988456),
 ('show', 0.024055643995162437),
 ('fringe', 0.020890393492141845),
 ('category', 0.018851028700980925),
 ('languageswearing', 0.018320766888482026),
 ('funny', 0.017743577229655902),
 ('strong', 0.017641678899529396),
 ('standup', 0.01740257746705355),
 ('age', 0.017325117353350728),
 ('comedian', 0.015162837636451017)]

Experiment 5: Exploring the Schedules of Events

  • 1 Event can have 1 to N Schedules.
  • 1 Schedule is in 1 Place
  • 1 Schedule can have 1 to N Performances
  • 1 Performance can have 1 to N Tickets
  • 1 Ticket has a max_price, min_price, currency.
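
The hierarchy listed above can be pictured as a nested record. The sketch below builds one made-up event with that shape and flattens it with plain loops into one row per ticket. The field names follow those visible in the notebook outputs (`schedules`, `place_id`, `performances`, `ts`, `tickets`, `min_price`, `max_price`, `currency`), while all values are invented for illustration.

```python
import pandas as pd

# Hypothetical event following the Event > Schedule > Performance > Ticket hierarchy
event = {
    "event_id": 1,
    "schedules": [
        {
            "place_id": 10,
            "performances": [
                {
                    "ts": "2021-06-05T19:30:00+01:00",
                    "tickets": [
                        {"type": "Standard", "currency": "GBP", "min_price": 10.0, "max_price": 25.0},
                    ],
                },
                {
                    "ts": "2021-06-06T19:30:00+01:00",
                    "tickets": [
                        {"type": "Standard", "currency": "GBP", "min_price": 10.0, "max_price": 25.0},
                        {"type": "Children", "currency": "GBP", "min_price": 5.0, "max_price": 5.0},
                    ],
                },
            ],
        }
    ],
}

# Walk the three nested levels and emit one flat row per ticket
rows = []
for schedule in event["schedules"]:
    for performance in schedule.get("performances", []):
        for ticket in performance.get("tickets", []):
            rows.append({
                "event_id": event["event_id"],
                "place_id": schedule["place_id"],
                "ts": performance["ts"],
                **ticket,
            })

tickets = pd.DataFrame(rows)
print(tickets)
```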

Let's start by exploding the schedules column

In [48]:
df["schedules"]
Out[48]:
0       [{'start_ts': '2021-06-05T19:30:00+01:00', 'en...
1       [{'start_ts': '2021-08-04T18:45:00+01:00', 'en...
2       [{'start_ts': '2021-08-11T18:45:00+01:00', 'en...
3       [{'start_ts': '2021-08-06T18:45:00+01:00', 'en...
4       [{'start_ts': '2021-09-18T17:00:00+01:00', 'en...
                              ...                        
2959    [{'start_ts': '2021-09-24T00:00:00+01:00', 'en...
2960    [{'start_ts': '2021-09-25T18:00:00+01:00', 'en...
2961    [{'start_ts': '2021-10-03T13:00:00+01:00', 'en...
2962    [{'start_ts': '2021-10-01T18:00:00+01:00', 'en...
2963    [{'start_ts': '2021-10-08T09:30:00+01:00', 'en...
Name: schedules, Length: 2964, dtype: object
In [49]:
# df_schedules is an alias of df (not a copy), so the in-place rename below also renames the columns of df;
# columns that do not exist (e.g. 'links') are silently ignored by rename
df_schedules=df
df_schedules.rename(columns={'tags':'event_tags', 'name':'event_name', 'links':'event_links'}, inplace=True)
df_schedules=df.explode('schedules')
#df_schedules
df_s=pd.concat([df_schedules.drop(['schedules'], axis=1), df_schedules['schedules'].apply(pd.Series)], axis=1)
In [50]:
df_s.iloc[0]
Out[50]:
event_id                                                        401143
modified_ts                                       2022-01-09T01:19:40Z
created_ts                                        2014-04-10T03:16:09Z
event_name                       Al Murray: Landlord of Hope and Glory
sort_name                        Al Murray: Landlord of Hope and Glory
status                                                            live
id                                                              401143
descriptions         [{'type': 'description.list.default', 'descrip...
website                                     http://thepublandlord.com/
event_tags                                          [Comedy, Stand-up]
category                                                        Comedy
properties           {'dropin_event': False, 'booking_essential': F...
ranking_level                                                        2
ranking_in_level                                                     1
phone_numbers                                                      NaN
alternative_names                                                  NaN
start_ts                                     2021-06-05T19:30:00+01:00
end_ts                                       2021-09-10T19:00:00+01:00
place_id                                                         22978
performances         [{'ts': '2021-06-05T19:30:00+01:00', 'links': ...
performance_space                                                  NaN
phone_numbers                                                      NaN
Name: 0, dtype: object

Getting the Frequency of Starting Dates of Events Schedules

In [51]:
df_start=df_s.groupby([pd.to_datetime(df_s['start_ts'])]).size().reset_index()
df_start=df_start.rename(columns={0: "number_of_times"})
df_start=df_start.sort_values(by=['number_of_times'], ascending=False)
df_start.reset_index()
Out[51]:
index start_ts number_of_times
0 603 2021-08-06 10:00:00+01:00 37
1 0 2021-05-01 00:00:00+01:00 14
2 2 2021-05-01 10:00:00+01:00 12
3 1986 2021-10-23 19:00:00+01:00 9
4 1968 2021-10-22 19:00:00+01:00 9
... ... ... ...
2103 786 2021-08-11 00:00:00+01:00 1
2104 785 2021-08-10 21:30:00+01:00 1
2105 783 2021-08-10 20:40:00+01:00 1
2106 781 2021-08-10 20:10:00+01:00 1
2107 2107 2021-10-31 23:15:00+00:00 1

2108 rows × 3 columns

Visualizing the frequency of schedule start timestamps (start_ts) computed above.

In [52]:
fig = px.histogram(df_start, x='start_ts', y="number_of_times", title="Frequency of Starts Dates Schedules")
fig.show()
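
The grouping above counts exact timestamps, and the start_ts values mix two UTC offsets (+01:00 and +00:00, as seen in Out[51]). If a count per calendar day is wanted instead, one option is sketched below on a few illustrative timestamps: parse everything as UTC, convert to a single time zone, and count per date. Using "Europe/London" is an assumption based on the events being UK-based, and the variable names are hypothetical.

```python
import pandas as pd

# Illustrative timestamps with mixed UTC offsets
start_ts = pd.Series([
    "2021-05-01T00:00:00+01:00",
    "2021-08-06T10:00:00+01:00",
    "2021-08-06T19:30:00+01:00",
    "2021-10-31T23:15:00+00:00",
])

# utc=True gives one timezone-aware dtype even when the offsets differ
parsed = pd.to_datetime(start_ts, utc=True).dt.tz_convert("Europe/London")

# Count schedules per local calendar day
per_day = parsed.dt.strftime("%Y-%m-%d").value_counts().sort_index()
print(per_day)
```

Converting back to local time matters for midnight starts: 2021-05-01T00:00:00+01:00 is 23:00 on 30 April in UTC, so counting on the UTC date would move it to the previous day.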

Getting the Frequency of End Dates of Events Schedules

In [53]:
df_end=df_s.groupby([pd.to_datetime(df_s['end_ts'])]).size().reset_index()
df_end=df_end.rename(columns={0: "number_of_times"})
df_end=df_end.sort_values(by=['number_of_times'], ascending=False)
df_end.reset_index()
fig = px.histogram(df_end, x='end_ts', y="number_of_times", title="Frequency of End Dates Schedules")
fig.show()

Experiment 6: Exploring the Performances Tickets of Events Schedules

  • 1 Event can have 1 to N Schedules.
  • 1 Schedule is in 1 Place
  • 1 Schedule can have 1 to N Performances
  • 1 Performance can have 1 to N Tickets
  • 1 Ticket has a max_price, min_price, currency.

Let's start by exploding the performances column. We cannot explode the performances column unless the schedules column has been exploded first, because performances are nested inside each schedule. For that reason, we use the df_s dataframe, in which the schedules column has already been exploded.
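
The dependency between the two explode steps can be seen on a tiny made-up example: the performances list only becomes a column once each schedule dict has been expanded. The sketch below follows the same explode and `apply(pd.Series)` pattern as the notebook, with invented data and hypothetical variable names.

```python
import pandas as pd

# Hypothetical event with two schedules holding 2 and 1 performances
events = pd.DataFrame({
    "event_id": [1],
    "schedules": [[
        {"place_id": 10, "performances": [{"ts": "a"}, {"ts": "b"}]},
        {"place_id": 20, "performances": [{"ts": "c"}]},
    ]],
})

# Step 1: one row per schedule, then expand each schedule dict into columns
schedules = events.explode("schedules").reset_index(drop=True)
by_schedule = pd.concat(
    [schedules.drop(columns="schedules"), schedules["schedules"].apply(pd.Series)],
    axis=1,
)

# Step 2: "performances" is now a column of lists, so it can be exploded too
by_performance = by_schedule.explode("performances")
print(by_performance)
```

Each step multiplies the rows: one event becomes two schedule rows and then three performance rows, which is the same effect that takes the notebook from 2964 events to 3373 schedules and 25537 performances.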

In [54]:
df_s
Out[54]:
event_id modified_ts created_ts event_name sort_name status id descriptions website event_tags ... ranking_level ranking_in_level phone_numbers alternative_names start_ts end_ts place_id performances performance_space phone_numbers
0 401143 2022-01-09T01:19:40Z 2014-04-10T03:16:09Z Al Murray: Landlord of Hope and Glory Al Murray: Landlord of Hope and Glory live 401143 [{'type': 'description.list.default', 'descrip... http://thepublandlord.com/ [Comedy, Stand-up] ... 2 1 NaN NaN 2021-06-05T19:30:00+01:00 2021-09-10T19:00:00+01:00 22978 [{'ts': '2021-06-05T19:30:00+01:00', 'links': ... NaN NaN
1 882166 2021-12-11T05:45:26Z 2017-11-07T16:43:35Z Catherine Bohart: Work in Progress Catherine Bohart: Work in Progress live 882166 [{'type': 'description.list.default', 'descrip... NaN [Comedy, Stand-up] ... 2 2 NaN NaN 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00 84275 [{'ts': '2021-08-04T18:45:00+01:00', 'links': ... NaN NaN
2 948902 2022-01-19T01:40:20Z 2018-03-03T05:58:00Z Hal Cruttenden: It's Best You Hear It From Me Hal Cruttenden: It's Best You Hear It From Me live 948902 [{'type': 'description.list.default', 'descrip... http://www.halcruttenden.com [Comedy, Stand-Up, Stand-up] ... 2 1 NaN NaN 2021-08-11T18:45:00+01:00 2021-08-15T18:45:00+01:00 15617 [{'ts': '2021-08-11T18:45:00+01:00', 'properti... NaN NaN
3 1200636 2022-01-18T01:39:28Z 2019-01-31T17:50:02Z Fern Brady: Autistic Bikini Queen Fern Brady: Autistic Bikini Queen live 1200636 [{'type': 'description.list.default', 'descrip... http://fernbrady.co.uk/ [Comedy, Stand-up] ... 2 2 NaN NaN 2021-08-06T18:45:00+01:00 2021-08-17T18:45:00+01:00 84275 [{'ts': '2021-08-06T18:45:00+01:00', 'links': ... NaN NaN
4 1370307 2021-09-18T01:04:09Z 2019-08-05T10:26:52Z Rob Auton: The Time Show Rob Auton: The Time Show live 1370307 [{'type': 'description.list.default', 'descrip... NaN [Comedy, Rob Auton, Stand-up] ... 2 2 NaN NaN 2021-09-18T17:00:00+01:00 2021-09-18T17:00:00+01:00 1 [{'ts': '2021-09-18T17:00:00+01:00', 'duration... NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2959 1712071 2021-09-02T17:43:01Z 2021-09-02T15:23:23Z Tayport Climate Festival Tayport Climate Festival live 1712071 [{'type': 'description.official', 'description... NaN [Days out, Nature, Scottish Festivals] ... 3 1 NaN NaN 2021-09-24T00:00:00+01:00 2021-09-26T00:00:00+01:00 130706 [{'ts': '2021-09-24', 'time_unknown': 'Times v... NaN NaN
2960 1672067 2021-09-21T14:23:39Z 2021-09-21T14:23:39Z Chopin & Champagne By Candlelight | Four Ball... Chopin & Champagne By Candlelight | Four Ball... live 1672067 [{'type': 'description.official', 'description... NaN [Music] ... 3 2 NaN NaN 2021-09-25T18:00:00+01:00 2021-09-25T20:00:00+01:00 130868 [{'ts': '2021-09-25T18:00:00+01:00', 'links': ... NaN NaN
2961 1724456 2021-09-29T09:31:55Z 2021-09-28T10:38:29Z Edinburgh Multicultural Festival 2021 Edinburgh Multicultural Festival 2021 live 1724456 [{'type': 'description.official', 'description... NaN [Dance, Festival, Music, Stand-up, Visual art] ... 3 1 NaN NaN 2021-10-03T13:00:00+01:00 2021-10-03T13:00:00+01:00 130944 [{'ts': '2021-10-03T13:00:00+01:00', 'duration... NaN NaN
2962 1726176 2021-10-01T17:43:02Z 2021-10-01T11:10:33Z Blunt Knife Co. presents an exhibition by This... Blunt Knife Co. presents an exhibition by This... live 1726176 [{'type': 'description.official', 'description... NaN [Exhibition, Visual art] ... 3 2 NaN NaN 2021-10-01T18:00:00+01:00 2021-10-08T10:00:00+01:00 130974 [{'ts': '2021-10-01T18:00:00+01:00', 'duration... NaN NaN
2963 1731142 2021-10-12T12:16:00Z 2021-10-11T15:12:08Z People, Places, Perspectives Exhibition People, Places, Perspectives Exhibition live 1731142 [{'type': 'description.official', 'description... NaN [Contemporary, Mixed Media, Painting & Drawing... ... 3 1 NaN NaN 2021-10-08T09:30:00+01:00 2021-10-31T09:30:00+00:00 131091 [{'ts': '2021-10-08T09:30:00+01:00', 'duration... NaN NaN

3373 rows × 22 columns

In [55]:
a=df_s[["event_id", "event_name", "performances", "event_tags", "start_ts", "end_ts", "place_id"]]
df_p=a.explode("performances")
In [56]:
df_p
Out[56]:
event_id event_name performances event_tags start_ts end_ts place_id
0 401143 Al Murray: Landlord of Hope and Glory {'ts': '2021-06-05T19:30:00+01:00', 'links': [... [Comedy, Stand-up] 2021-06-05T19:30:00+01:00 2021-09-10T19:00:00+01:00 22978
0 401143 Al Murray: Landlord of Hope and Glory {'ts': '2021-09-10T19:00:00+01:00', 'links': [... [Comedy, Stand-up] 2021-06-05T19:30:00+01:00 2021-09-10T19:00:00+01:00 22978
1 882166 Catherine Bohart: Work in Progress {'ts': '2021-08-04T18:45:00+01:00', 'links': [... [Comedy, Stand-up] 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00 84275
1 882166 Catherine Bohart: Work in Progress {'ts': '2021-08-05T18:45:00+01:00', 'links': [... [Comedy, Stand-up] 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00 84275
1 882166 Catherine Bohart: Work in Progress {'ts': '2021-08-06T17:30:00+01:00', 'links': [... [Comedy, Stand-up] 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00 84275
... ... ... ... ... ... ... ...
2963 1731142 People, Places, Perspectives Exhibition {'ts': '2021-10-27T09:30:00+01:00', 'duration'... [Contemporary, Mixed Media, Painting & Drawing... 2021-10-08T09:30:00+01:00 2021-10-31T09:30:00+00:00 131091
2963 1731142 People, Places, Perspectives Exhibition {'ts': '2021-10-28T09:30:00+01:00', 'duration'... [Contemporary, Mixed Media, Painting & Drawing... 2021-10-08T09:30:00+01:00 2021-10-31T09:30:00+00:00 131091
2963 1731142 People, Places, Perspectives Exhibition {'ts': '2021-10-29T09:30:00+01:00', 'duration'... [Contemporary, Mixed Media, Painting & Drawing... 2021-10-08T09:30:00+01:00 2021-10-31T09:30:00+00:00 131091
2963 1731142 People, Places, Perspectives Exhibition {'ts': '2021-10-30T09:30:00+01:00', 'duration'... [Contemporary, Mixed Media, Painting & Drawing... 2021-10-08T09:30:00+01:00 2021-10-31T09:30:00+00:00 131091
2963 1731142 People, Places, Perspectives Exhibition {'ts': '2021-10-31T09:30:00+00:00', 'duration'... [Contemporary, Mixed Media, Painting & Drawing... 2021-10-08T09:30:00+01:00 2021-10-31T09:30:00+00:00 131091

25537 rows × 7 columns

In [57]:
df_p=pd.concat([df_p.drop(['performances'], axis=1), df_p['performances'].apply(pd.Series)], axis=1)
In [58]:
df_p[0:2]
Out[58]:
event_id event_name event_tags start_ts end_ts place_id ts links tickets properties duration descriptions time_unknown
0 401143 Al Murray: Landlord of Hope and Glory [Comedy, Stand-up] 2021-06-05T19:30:00+01:00 2021-09-10T19:00:00+01:00 22978 2021-06-05T19:30:00+01:00 [{'type': 'booking', 'url': 'https://www.seeti... [{'type': 'Standard', 'currency': 'GBP', 'min_... NaN NaN NaN NaN
0 401143 Al Murray: Landlord of Hope and Glory [Comedy, Stand-up] 2021-06-05T19:30:00+01:00 2021-09-10T19:00:00+01:00 22978 2021-09-10T19:00:00+01:00 [{'type': 'booking', 'url': 'https://www.seeti... [{'type': 'Standard', 'currency': 'GBP', 'min_... NaN NaN NaN NaN
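
The expansion in cell 57 turns every key of the performances dictionaries into its own column. As a minimal sketch on made-up data (the frame `toy` and its values are hypothetical, not taken from the dataset), the same reshaping can be done by building a DataFrame from the list of dictionaries. This is usually faster than `apply(pd.Series)` on large frames, and it assumes every cell holds a dictionary.

```python
import pandas as pd

# Hypothetical miniature of df_p: one dictionary per row in 'performances'.
toy = pd.DataFrame({
    "event_id": [1, 1, 2],
    "performances": [
        {"ts": "2021-06-05T19:30:00+01:00", "tickets": [{"type": "Standard"}]},
        {"ts": "2021-09-10T19:00:00+01:00", "tickets": [{"type": "Standard"}]},
        {"ts": "2021-08-04T18:45:00+01:00", "duration": 60},
    ],
})

# Build the expanded columns directly from the dictionaries.
# Keys missing from a dictionary become NaN, as with apply(pd.Series).
expanded = pd.DataFrame(toy["performances"].tolist(), index=toy.index)

# Replace the dictionary column with the expanded columns.
toy_flat = pd.concat([toy.drop(columns=["performances"]), expanded], axis=1)
print(toy_flat.columns.tolist())
```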

Exploring tickets

Next we explode the tickets column. First we remove the rows whose ticket information is empty.

In [59]:
df_p=df_p.dropna(subset=['tickets'])

Since we don't need all the columns, we select a few of them.

In [60]:
df_t=df_p[["event_id", "event_name", "descriptions", "event_tags", "tickets", "place_id", "start_ts", "end_ts"]]
In [61]:
df_t[0:5]
Out[61]:
event_id event_name descriptions event_tags tickets place_id start_ts end_ts
0 401143 Al Murray: Landlord of Hope and Glory NaN [Comedy, Stand-up] [{'type': 'Standard', 'currency': 'GBP', 'min_... 22978 2021-06-05T19:30:00+01:00 2021-09-10T19:00:00+01:00
0 401143 Al Murray: Landlord of Hope and Glory NaN [Comedy, Stand-up] [{'type': 'Standard', 'currency': 'GBP', 'min_... 22978 2021-06-05T19:30:00+01:00 2021-09-10T19:00:00+01:00
1 882166 Catherine Bohart: Work in Progress NaN [Comedy, Stand-up] [{'type': 'Standard', 'currency': 'GBP'}] 84275 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00
1 882166 Catherine Bohart: Work in Progress NaN [Comedy, Stand-up] [{'type': 'Standard', 'currency': 'GBP'}] 84275 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00
1 882166 Catherine Bohart: Work in Progress NaN [Comedy, Stand-up] [{'type': 'Standard', 'currency': 'GBP'}] 84275 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00
In [62]:
df_t1=df_t.explode("tickets")
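
To make the two steps above concrete, here is a small sketch on hypothetical rows (the event ids and ticket dictionaries are invented for illustration). It shows that `dropna` discards performances without ticket information and that `explode` produces one row per ticket while repeating the original row label.

```python
import pandas as pd

# Hypothetical rows: a list of ticket dictionaries per performance, or NaN when absent.
toy = pd.DataFrame({
    "event_id": [401143, 882166, 999999],
    "tickets": [
        [{"type": "Standard"}, {"type": "Children"}],
        [{"type": "Standard"}],
        float("nan"),
    ],
})

# Drop performances without ticket information, then emit one row per ticket.
toy = toy.dropna(subset=["tickets"])
toy_exploded = toy.explode("tickets")

# The original row label is repeated for every ticket of the same performance.
print(toy_exploded.index.tolist())
print(len(toy_exploded))
```

Because the index labels repeat after `explode`, later cells that need unique row labels call `reset_index(drop=True)` first.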

Now we convert the maximum and minimum ticket prices to numeric values. Missing prices are filled with 0.

In [63]:
df_tickets=pd.concat([df_t1.drop(['tickets'], axis=1), df_t1['tickets'].apply(pd.Series)], axis=1)
df_tickets['min_price'] = pd.to_numeric(df_tickets['min_price'])
df_tickets['max_price'] = pd.to_numeric(df_tickets['max_price'])
df_tickets['min_price']= df_tickets['min_price'].fillna(0)
df_tickets['max_price']= df_tickets['max_price'].fillna(0)
In [64]:
df_tickets[0:5]
Out[64]:
event_id event_name descriptions event_tags place_id start_ts end_ts 0 currency description max_price min_price type
0 401143 Al Murray: Landlord of Hope and Glory NaN [Comedy, Stand-up] 22978 2021-06-05T19:30:00+01:00 2021-09-10T19:00:00+01:00 NaN GBP NaN 0.0 30.8 Standard
0 401143 Al Murray: Landlord of Hope and Glory NaN [Comedy, Stand-up] 22978 2021-06-05T19:30:00+01:00 2021-09-10T19:00:00+01:00 NaN GBP NaN 0.0 30.8 Standard
1 882166 Catherine Bohart: Work in Progress NaN [Comedy, Stand-up] 84275 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00 NaN GBP NaN 0.0 0.0 Standard
1 882166 Catherine Bohart: Work in Progress NaN [Comedy, Stand-up] 84275 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00 NaN GBP NaN 0.0 0.0 Standard
1 882166 Catherine Bohart: Work in Progress NaN [Comedy, Stand-up] 84275 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00 NaN GBP NaN 0.0 0.0 Standard
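
In Out[64] the first rows have a min_price of 30.8 but a max_price of 0.0, which suggests that the maximum price was missing and was filled with 0. The sketch below, on hypothetical ticket dictionaries, shows one way to keep missing prices distinguishable from genuinely free tickets. The column names `max_price_missing` and `max_price_filled` are our own and do not exist in the notebook.

```python
import pandas as pd

# Toy ticket dictionaries (hypothetical values, shaped like the 'tickets' entries above).
tickets = pd.Series([
    {"type": "Standard", "currency": "GBP", "min_price": "30.8"},
    {"type": "Standard", "currency": "GBP", "min_price": "0", "max_price": "0"},
    {"type": "Children", "currency": "GBP", "min_price": "5", "max_price": "7.5"},
])

toy = pd.DataFrame(tickets.tolist())

# Convert to numbers; absent or unparsable values become NaN.
toy["max_price"] = pd.to_numeric(toy["max_price"], errors="coerce")

# Remember which prices were missing before filling them with 0.
toy["max_price_missing"] = toy["max_price"].isna()
toy["max_price_filled"] = toy["max_price"].fillna(0)

n_zero_after_fill = int((toy["max_price_filled"] == 0).sum())
n_truly_zero = int((toy["max_price"] == 0).sum())
print(n_zero_after_fill, n_truly_zero)
```

With such a flag, the counts of free tickets in the following experiments could be reported separately from the tickets whose price is simply unknown.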

Experiment 6.1: Getting the Frequency of Price Tickets

We work only with max_price; a max_price of 0 is treated as a free ticket.

In [65]:
g_maxp=df_tickets.groupby(['max_price']).size().reset_index()
g_maxp=g_maxp.rename(columns={0: "number_of_times"})
#g_maxp=g_maxp.sort_values(by=['number_of_times'], ascending=False)
# The groups are sorted by max_price, so the first row (index 0) holds max_price == 0, i.e. the free tickets
free_tickets=g_maxp[0:1]
# Remove the free tickets row
g_maxp=g_maxp.drop([0])
g_maxp
Out[65]:
max_price number_of_times
1 3.00 1
2 4.00 2
3 6.00 3
4 7.00 7
5 7.09 1
... ... ...
127 145.00 1
128 148.50 1
129 165.00 2
130 375.00 1
131 400.00 1

131 rows × 2 columns

In [66]:
fig = px.line(g_maxp, x="max_price", y="number_of_times", title='Frequency of price tickets')
fig.show()
In [67]:
print("The number of free tickets is: %s" %free_tickets["number_of_times"].values[0])
The number of free tickets is: 33757
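
The cell above picks the free tickets by position (the first row of the grouped frame). A slightly more defensive sketch, on hypothetical prices, selects them by value instead, so the result does not depend on the row order. `reset_index(name=...)` also names the count column directly.

```python
import pandas as pd

# Hypothetical prices; 0.0 stands for free tickets.
toy = pd.DataFrame({"max_price": [0.0, 0.0, 3.0, 4.0, 4.0, 0.0]})

# One row per distinct price, with the count in a named column.
g = toy.groupby("max_price").size().reset_index(name="number_of_times")

# Select by value rather than by position.
free = g[g["max_price"] == 0]
paid = g[g["max_price"] > 0]

n_free = int(free["number_of_times"].sum())
print("The number of free tickets is: %s" % n_free)
```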

Experiment 6.2: Getting the frequency of type (Standard, Children) tickets

In [68]:
tickets_type=df_tickets.groupby(['type']).size().reset_index()
tickets_type=tickets_type.rename(columns={0: "number_of_times"}).sort_values(by=['number_of_times'], ascending=False)
tickets_type
Out[68]:
type number_of_times
12 Standard 21898
4 Concession 4387
2 Children 4277
5 Family 145
13 Students 113
10 Reservation Fee 60
8 Members 49
19 under 16s 31
0 Adult 18
7 Group Discounts 9
14 Taster class 8
11 Seniors 5
9 Pre-Theatre Package 1
1 Carers 1
6 General Admission 1
3 Children 1
15 Under 10s 1
16 Under 12s 1
17 Under 3s 1
18 adult 1
In [69]:
px.histogram(tickets_type, x="type", y="number_of_times", histfunc="sum", color="type", title='Frequency of type tickets')
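
Out[68] lists Children twice and both Adult and adult. That suggests labels that differ only in whitespace or capitalisation, though this is an assumption, since the raw strings are not shown. A minimal sketch of normalising such labels before counting, on hypothetical values:

```python
import pandas as pd

# Hypothetical labels imitating the variants seen in Out[68].
types = pd.Series(["Standard", "Children", "Children ", "Adult", "adult", "Under 12s", "under 16s"])

# Trim surrounding whitespace and unify capitalisation before counting.
normalised = types.str.strip().str.capitalize()

counts = normalised.value_counts()
print(counts)
```

Note that `str.capitalize` lowercases everything after the first character, so a label such as "Group Discounts" would become "Group discounts". If the original spelling matters for display, the normalised value can be used only as the grouping key.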

6.3 Exploring Performances Places

In [70]:
df_tickets["place_id"]
Out[70]:
0        22978
0        22978
1        84275
1        84275
1        84275
         ...  
2963    131091
2963    131091
2963    131091
2963    131091
2963    131091
Name: place_id, Length: 34844, dtype: int64

Creating places dataframe

In [71]:
data_path="dataset/sample_20180501.json"
with open(data_path, 'r') as f:
    data = json.load(f)
places=data["places"]
print(len(places))
df_places = pd.DataFrame(places)
1224
In [72]:
df_place = df_tickets.merge(df_places, on='place_id')
In [73]:
df_place.shape[0]
Out[73]:
28325
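
The merged frame has 28325 rows, while df_tickets had 34844. `merge` performs an inner join by default, so this drop is consistent with some place_id values having no match in the places data. We have not verified this on the dataset. The sketch below, on hypothetical frames, shows how a left join with `indicator=True` can count the unmatched rows.

```python
import pandas as pd

# Hypothetical stand-ins for df_tickets and df_places.
tickets = pd.DataFrame({"event_id": [1, 2, 3, 4], "place_id": [10, 10, 20, 99]})
places = pd.DataFrame({"place_id": [10, 20, 30], "town": ["Edinburgh", "Melrose", "Kelso"]})

# A left join keeps every ticket row; indicator=True adds a '_merge' column
# saying whether a matching place was found.
checked = tickets.merge(places, on="place_id", how="left", indicator=True)

n_unmatched = int((checked["_merge"] == "left_only").sum())
inner_rows = len(tickets.merge(places, on="place_id"))
print(n_unmatched, inner_rows)
```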

6.3.1 Frequency of Performances per Town

In [74]:
df_town=df_place.dropna(subset=['town'])
town=df_town.groupby(['town']).size().reset_index()
town=town.rename(columns={0: "number_of_times"})
town=town.drop([0])
In [75]:
town=town.sort_values(by=['number_of_times'], ascending=False)
town
Out[75]:
town number_of_times
22 Edinburgh 20571
31 Humbie 686
45 Melrose 602
10 Coldstream 492
6 Biggar 376
... ... ...
12 Cowdenbeath 2
17 Dirleton 1
30 Hawick 1
42 Livingston village 1
39 Leven 1

61 rows × 2 columns

In [76]:
px.scatter(town, x="town",y='number_of_times', color='number_of_times', size="number_of_times", size_max=60, title="Frequency of Performances per Town")

6.3.2 Frequency of Type tickets per town

In [77]:
town_type=df_town.groupby(['town', 'type']).size().reset_index()
town_type=town_type.rename(columns={0: "number_of_times"})
town_type=town_type[town_type["town"]!=""]
In [78]:
town_type=town_type.sort_values(by=['number_of_times'], ascending=False)
town_type
Out[78]:
town type number_of_times
44 Edinburgh Standard 13705
39 Edinburgh Concession 3161
38 Edinburgh Children 787
97 Melrose Standard 387
66 Humbie Children 343
... ... ... ...
95 Melrose Concession 1
46 Edinburgh Under 10s 1
69 Innerleithen Standard 1
42 Edinburgh Pre-Theatre Package 1
68 Innerleithen Children 1

133 rows × 3 columns

In [79]:
fig = px.scatter(town_type, x='town', y='type', color='number_of_times', title="Frequency of type tickets per town")
fig.show()
In [80]:
px.scatter(town_type, x="town",y='type', color='number_of_times', size="number_of_times", size_max=60, title="Frequency of performances type tickets per town")

6.3.3 Frequency of Max_Price tickets per town

In [81]:
a=df_town[["town", "max_price"]]
a=a[a["town"]!=""]
town_price=a.groupby(['town', 'max_price']).size().reset_index()
town_price=town_price.rename(columns={0: "number_of_times"})
town_price=town_price.sort_values(by=['number_of_times'], ascending=False)
town_price
Out[81]:
town max_price number_of_times
27 Edinburgh 0.00 19930
153 Humbie 0.00 686
171 Melrose 0.00 596
11 Coldstream 0.00 492
7 Biggar 0.00 376
... ... ... ...
109 Edinburgh 50.00 1
107 Edinburgh 48.50 1
103 Edinburgh 44.00 1
102 Edinburgh 43.00 1
100 Edinburgh 40.15 1

201 rows × 3 columns

6.3.3.1 Frequency of free tickets per town

In [82]:
free_town_price=town_price[town_price["max_price"]== 0.0]
free_town_price
Out[82]:
town max_price number_of_times
27 Edinburgh 0.0 19930
153 Humbie 0.0 686
171 Melrose 0.0 596
11 Coldstream 0.0 492
7 Biggar 0.0 376
198 Wilkieston 0.0 372
156 Kelso 0.0 370
8 Bonchester Bridge 0.0 368
155 Jedburgh 0.0 350
200 Wormit 0.0 320
191 St Andrews 0.0 318
187 Saline 0.0 314
185 Pitscottie 0.0 306
142 Galashiels 0.0 306
159 Kirkcaldy 0.0 292
184 Peeblesshire 0.0 283
158 Kincardine 0.0 244
26 Eddleston 0.0 200
175 Musselburgh 0.0 190
15 Crossford 0.0 174
20 Dunfermline 0.0 156
164 Livingston 0.0 143
16 Cupar 0.0 128
179 North Berwick 0.0 94
5 Berwick-upon-Tweed 0.0 71
168 Lochgelly 0.0 64
3 Bathgate 0.0 55
163 Linlithgow 0.0 44
180 Peebles 0.0 41
9 Burntisland 0.0 41
160 Kirkliston 0.0 36
148 Glenrothes 0.0 34
14 Crail 0.0 20
1 Aberlady 0.0 18
17 Dalkeith 0.0 18
190 South Queensferry 0.0 17
0 Aberdour 0.0 15
2 Anstruther 0.0 10
10 Cockenzie 0.0 6
149 Gorebridge 0.0 6
178 Newburgh 0.0 4
161 Leuchars 0.0 4
186 Prestonpans 0.0 4
24 Duns 0.0 2
169 Longniddry 0.0 2
141 Falkland 0.0 2
147 Gifford 0.0 2
154 Innerleithen 0.0 2
151 Gullane 0.0 2
13 Cowdenbeath 0.0 2
197 West Linton 0.0 2
192 St Boswells 0.0 2
146 Gattonside 0.0 2
12 Collessie 0.0 2
19 Dunbar 0.0 2
152 Hawick 0.0 1
18 Dirleton 0.0 1
167 Livingston village 0.0 1
162 Leven 0.0 1
In [83]:
fig = px.bar(free_town_price, x='town', y='number_of_times', color='number_of_times', barmode='group', title="Frequency of Free Tickets per Town")
fig.show()

6.3.3.2 Frequency of non-free tickets per town

In [84]:
town_price=town_price[town_price["max_price"]!= 0.0]
town_price
Out[84]:
town max_price number_of_times
33 Edinburgh 7.99 184
41 Edinburgh 10.00 80
72 Edinburgh 20.00 73
189 Selkirk 36.00 40
165 Livingston 9.50 34
... ... ... ...
109 Edinburgh 50.00 1
107 Edinburgh 48.50 1
103 Edinburgh 44.00 1
102 Edinburgh 43.00 1
100 Edinburgh 40.15 1

142 rows × 3 columns

In [85]:
fig = px.bar(town_price, x='town', y='max_price', color='number_of_times', barmode='group', title="Frequency of Price Tickets per Town")
fig.show()
In [86]:
town_price.groupby(["town"]).sum().sort_values(by=['max_price'], ascending=False)
Out[86]:
max_price number_of_times
town
Edinburgh 4445.96 641
Peebles 115.00 3
Melrose 110.00 6
Wilkieston 70.00 1
Gorebridge 67.20 1
Galashiels 67.00 4
Longniddry 61.52 2
Dunfermline 60.50 9
St Boswells 55.00 3
Kelso 50.00 4
Selkirk 44.00 44
Livingston 25.50 35
Duns 20.00 2
Musselburgh 16.00 2
Berwick-upon-Tweed 14.00 1
Tranent 13.00 6
Bathgate 10.00 1
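
Note that the max_price column in Out[86] is the sum of the distinct price levels of each town, not an amount weighted by how often each price occurs. If a typical price per town is wanted, one option is a count-weighted average. The sketch below uses hypothetical rows in the shape of town_price; the column names `weighted` and `mean_max_price` are our own.

```python
import pandas as pd

# Hypothetical (town, max_price, number_of_times) rows in the shape of town_price.
toy = pd.DataFrame({
    "town": ["Selkirk", "Selkirk", "Kelso"],
    "max_price": [36.0, 8.0, 50.0],
    "number_of_times": [40, 4, 4],
})

# Weight each price level by how often it occurs, then aggregate per town.
toy["weighted"] = toy["max_price"] * toy["number_of_times"]
per_town = toy.groupby("town")[["weighted", "number_of_times"]].sum()

# Count-weighted average of max_price per town.
per_town["mean_max_price"] = per_town["weighted"] / per_town["number_of_times"]
print(per_town)
```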

6.4 Selecting Scottish Cities: Edinburgh, Glasgow, Dundee, Perth, Inverness, Aberdeen, St Andrews

6.4.1 Frequency of Price Tickets per Scottish City

In [87]:
scot_towns_price=town_price[town_price['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
In [88]:
scot_towns_price[0:10]
Out[88]:
town max_price number_of_times
33 Edinburgh 7.99 184
41 Edinburgh 10.00 80
72 Edinburgh 20.00 73
60 Edinburgh 15.00 24
74 Edinburgh 21.50 22
51 Edinburgh 13.00 18
87 Edinburgh 32.00 18
55 Edinburgh 13.50 16
47 Edinburgh 12.00 9
46 Edinburgh 11.29 8
In [89]:
fig = px.bar(scot_towns_price, x='town', y='max_price', color='number_of_times', barmode='group', title="Frequency of Price Tickets per Scottish City")
fig.show()
In [90]:
scot_towns_price.groupby(["town"]).sum().sort_values(by=['max_price'], ascending=False)
Out[90]:
max_price number_of_times
town
Edinburgh 4445.96 641

6.4.2 Frequency of Type Tickets per Scottish City

In [91]:
scot_towns_type=town_type[town_type['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
In [92]:
scot_towns_type[0:10]
Out[92]:
town type number_of_times
44 Edinburgh Standard 13705
39 Edinburgh Concession 3161
38 Edinburgh Children 787
124 St Andrews Standard 162
122 St Andrews Children 95
41 Edinburgh Members 48
40 Edinburgh Family 34
45 Edinburgh Students 11
43 Edinburgh Seniors 5
123 St Andrews Concession 3
In [93]:
fig = px.bar(scot_towns_type, x='town', y='number_of_times', color='type', barmode='group', title="Frequency of Type Tickets per Scottish City")
fig.show()
In [94]:
scot_towns_type.groupby(["town"]).sum()
Out[94]:
number_of_times
town
Edinburgh 17755
St Andrews 260
In [95]:
df_place.loc[0]
Out[95]:
event_id                                                     401143
event_name                    Al Murray: Landlord of Hope and Glory
descriptions_x                                                  NaN
event_tags                                       [Comedy, Stand-up]
place_id                                                      22978
start_ts                                  2021-06-05T19:30:00+01:00
end_ts                                    2021-09-10T19:00:00+01:00
0                                                               NaN
currency                                                        GBP
description                                                     NaN
max_price                                                       0.0
min_price                                                      30.8
type                                                       Standard
address                                           35 Canmore Street
email                                  info@alhambradunfermline.com
postal_code                                                KY12 7NX
properties                           {'place.capacity.max': '2100'}
sort_name                                          Alhambra Theatre
town                                                    Dunfermline
website                         http://www.alhambradunfermline.com/
modified_ts                                    2020-04-17T10:12:52Z
created_ts                                     2020-04-17T10:12:52Z
name                                               Alhambra Theatre
loc               {'latitude': '56.07007575033162', 'longitude':...
country_code                                                     GB
tags                  [Concert Hall, Music venue, Theatres, Venues]
descriptions_y    [{'type': 'description.list.default', 'descrip...
phone_numbers                        {'box_office': '01383 740384'}
status                                                         live
Name: 0, dtype: object

6.4.3 Frequency of Schedules Dates per Event and per Scottish City

In [96]:
df_place2=df_place.dropna(subset=['town'])
df_scott=df_place2[df_place2['town'].isin(["Edinburgh", "Glasgow", "Perth", "Inverness", "Dundee", "St Andrews", "Aberdeen"])]
df_scott=df_scott[["event_id", "event_name", "event_tags", "town", "start_ts", "end_ts"]]
df_scott[0:3]
Out[96]:
event_id event_name event_tags town start_ts end_ts
13 882166 Catherine Bohart: Work in Progress [Comedy, Stand-up] Edinburgh 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00
14 882166 Catherine Bohart: Work in Progress [Comedy, Stand-up] Edinburgh 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00
15 882166 Catherine Bohart: Work in Progress [Comedy, Stand-up] Edinburgh 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00

Note: An event can have several schedules, and each schedule has a starting and an end date. Therefore, an event can have several starting and end dates.
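
Since df_scott has one row per performance and ticket type, the same schedule appears on many rows. As a sketch, assuming a schedule is identified by its (start_ts, end_ts) pair, the number of distinct schedules per event can be counted as follows. The rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows in the shape of df_scott: one row per performance and ticket type.
toy = pd.DataFrame({
    "event_name": ["Event A", "Event A", "Event A", "Event B"],
    "town": ["Edinburgh"] * 4,
    "start_ts": ["2021-08-04T18:45:00+01:00", "2021-08-04T18:45:00+01:00",
                 "2021-09-01T10:00:00+01:00", "2021-08-27T15:00:00+01:00"],
    "end_ts": ["2021-08-07T17:30:00+01:00", "2021-08-07T17:30:00+01:00",
               "2021-09-05T10:00:00+01:00", "2021-08-29T15:00:00+01:00"],
})

# Keep one row per (event, schedule), then count schedules per event.
schedules = toy.drop_duplicates(subset=["event_name", "start_ts", "end_ts"])
n_schedules = schedules.groupby("event_name").size()
print(n_schedules)
```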

In [97]:
fig = px.scatter(df_scott, x='start_ts', y="event_name", title="Frequency of starting date per event in Scottish cities")
fig.show()
In [98]:
fig = px.scatter(df_scott, x='end_ts', y="event_name", title="Frequency of ending date per event in Scottish cities")
fig.show()

6.4.4 Grouping Schedules per Event and Scottish City

In [99]:
scott_schedule=df_scott.groupby(['event_name', 'town']).size().reset_index()
scott_schedule=scott_schedule.rename(columns={0: "number_of_times"})
scott_schedule=scott_schedule.sort_values(by=['number_of_times'], ascending=False)
scott_schedule
Out[99]:
event_name town number_of_times
1081 Mercat Tours: Ghostly Underground Edinburgh 990
145 Archie Brennan: Tapestry Goes Pop! Edinburgh 460
1084 Mercat Tours: Historic Underground Edinburgh 396
1282 Phantasmaphone Edinburgh 360
318 Charity Garden Opening - 101 Greenbank Crescent Edinburgh 264
... ... ... ...
1002 Louise Dodds Quartet Edinburgh 1
1003 Louise Dodds Quintet Edinburgh 1
1008 Loyiso Gola: Pop Culture Edinburgh 1
1009 Luca Stricagnoli Edinburgh 1
2046 ‘The Divell burn up the kirk!’ The language of... St Andrews 1

2047 rows × 3 columns

In [100]:
t=scott_schedule.groupby(["event_name"]).sum().sort_values(by=['number_of_times'], ascending=False)
t
Out[100]:
number_of_times
event_name
Mercat Tours: Ghostly Underground 990
Archie Brennan: Tapestry Goes Pop! 460
Mercat Tours: Historic Underground 396
Phantasmaphone 360
Mercat Tours: Doomed, Dead & Buried 18+ 264
... ...
Oktoberfest Closing Party: Back Chat Brass 1
Bourbon Saturday Relaunch Party 1
Bound In Sound presents John Paynter 1
Old Time Sailors 1
‘The Divell burn up the kirk!’ The language of religious protest in sixteenth-century Fife 1

2041 rows × 1 columns

In [101]:
fig = px.bar(t, title="Frequency of Schedules per event")
fig.show()

6.4.5 Exploring Tags per Schedule and Scottish Cities

In [102]:
a=df_scott.reset_index(drop=True)
tags_town=a[["event_tags", "town"]]
tags_town=tags_town.explode("event_tags")
tags_town
Out[102]:
event_tags town
0 Comedy Edinburgh
0 Stand-up Edinburgh
1 Comedy Edinburgh
1 Stand-up Edinburgh
2 Comedy Edinburgh
... ... ...
20884 Music Edinburgh
20885 Music Edinburgh
20886 Music Edinburgh
20887 Music Edinburgh
20888 Music Edinburgh

60624 rows × 2 columns

In [103]:
scott_tag=tags_town.groupby(['town', 'event_tags']).size().reset_index()
scott_tag=scott_tag.rename(columns={0: "number_of_times"})
scott_tag=scott_tag.sort_values(by=['number_of_times'], ascending=False)
scott_tag
Out[103]:
town event_tags number_of_times
94 Edinburgh Days out 4594
75 Edinburgh Comedy 4529
377 Edinburgh Visual art 4041
174 Edinburgh History 3377
352 Edinburgh Theatre 3224
... ... ... ...
223 Edinburgh Motherwell 1
227 Edinburgh Musical comedy 1
231 Edinburgh Nirvana 1
239 Edinburgh Participation 1
431 St Andrews World 1

432 rows × 3 columns

In [104]:
fig=px.histogram(scott_tag, x="town", y="number_of_times", histfunc="sum", color="event_tags", title='Frequency of tags in Scottish Cities')
fig.update_layout(legend_traceorder="reversed")
fig.show()
In [105]:
t=scott_tag.groupby(["event_tags"]).sum().sort_values(by=['number_of_times'], ascending=False)
t
Out[105]:
number_of_times
event_tags
Days out 4832
Comedy 4533
Visual art 4041
History 3403
Theatre 3258
... ...
Fun runs 1
Scarlets 1
Funky House 1
Rugby 1
Laura Veirs 1

398 rows × 1 columns

6.4.5.1 Exploring the Frequency of schedules tags for Edinburgh

In [106]:
edi_scott_tag=scott_tag[scott_tag['town'].isin(["Edinburgh"])]
edi_scott_tag
Out[106]:
town event_tags number_of_times
94 Edinburgh Days out 4594
75 Edinburgh Comedy 4529
377 Edinburgh Visual art 4041
174 Edinburgh History 3377
352 Edinburgh Theatre 3224
... ... ... ...
222 Edinburgh Mortal Kombat 1
223 Edinburgh Motherwell 1
227 Edinburgh Musical comedy 1
231 Edinburgh Nirvana 1
239 Edinburgh Participation 1

395 rows × 3 columns

In [107]:
edi_scott_tag.groupby(["event_tags"]).sum().sort_values(by=['number_of_times'], ascending=False)
Out[107]:
number_of_times
event_tags
Days out 4594
Comedy 4529
Visual art 4041
History 3377
Theatre 3224
... ...
Deep House 1
Spectator 1
Dog Walk 1
Dog show & trials 1
Dundee 1

395 rows × 1 columns

In [108]:
fig = px.bar(edi_scott_tag, x='town', y='number_of_times', color='event_tags', barmode='group', title="Frequency of schedules tags for Edinburgh")
fig.show()

6.4.6 Histograms of starting/end schedules dates for Edinburgh

In [109]:
scott_start=df_scott.groupby([pd.to_datetime(df_scott['start_ts']), "town"]).size().reset_index()
scott_start=scott_start.rename(columns={0: "number_of_times"})
scott_start=scott_start.sort_values(by=['number_of_times'], ascending=False)
scott_start.reset_index()
Out[109]:
index start_ts town number_of_times
0 219 2021-07-01 13:15:00+01:00 Edinburgh 990
1 452 2021-08-06 10:00:00+01:00 Edinburgh 677
2 3 2021-05-01 10:00:00+01:00 Edinburgh 650
3 221 2021-07-01 16:00:00+01:00 Edinburgh 396
4 441 2021-08-05 18:00:00+01:00 Edinburgh 354
... ... ... ... ...
1714 1200 2021-09-05 18:30:00+01:00 Edinburgh 1
1715 1202 2021-09-05 21:00:00+01:00 Edinburgh 1
1716 1203 2021-09-06 19:00:00+01:00 Edinburgh 1
1717 1204 2021-09-06 19:30:00+01:00 Edinburgh 1
1718 1718 2021-10-31 23:15:00+00:00 Edinburgh 1

1719 rows × 4 columns
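
The timestamps above mix two UTC offsets (+01:00 and +00:00). Depending on the pandas version, parsing such strings without further options may produce a generic object column or a warning. A minimal sketch that passes `utc=True` to obtain a single timezone-aware column and then counts entries per calendar day (the three timestamps are illustrative):

```python
import pandas as pd

# Timestamps with two different UTC offsets, as in the start_ts/end_ts columns.
raw = pd.Series([
    "2021-10-30T09:30:00+01:00",
    "2021-10-31T09:30:00+00:00",
    "2021-10-31T23:15:00+00:00",
])

# utc=True converts every value to UTC and yields one timezone-aware dtype.
ts = pd.to_datetime(raw, utc=True)

# Count entries per calendar day (in UTC).
per_day = ts.groupby(ts.dt.date).size()
print(per_day)
```

Days are taken in UTC here, so a local time shortly after midnight during summer time falls on the previous day. `ts.dt.tz_convert` can be applied first if local days are preferred.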

In [110]:
ed_scott_start=scott_start[scott_start['town'].isin(["Edinburgh"])].reset_index()
ed_scott_start.groupby(["start_ts"]).sum().sort_values(by=['number_of_times'], ascending=False)
#fig = px.bar(ed_scott_start, x='town', y='number_of_times', color='start_ts', barmode='group', title="Frequency of starting date schedules for Edinburgh")
#fig.show()
Out[110]:
index number_of_times
start_ts
2021-07-01 13:15:00+01:00 219 990
2021-08-06 10:00:00+01:00 452 677
2021-05-01 10:00:00+01:00 3 650
2021-07-01 16:00:00+01:00 221 396
2021-08-05 18:00:00+01:00 441 354
... ... ...
2021-07-22 14:00:00+01:00 320 1
2021-05-26 19:15:00+01:00 85 1
2021-07-21 20:00:00+01:00 318 1
2021-06-22 19:45:00+01:00 180 1
2021-10-31 23:15:00+00:00 1718 1

1653 rows × 2 columns

In [111]:
scott_end=df_scott.groupby([pd.to_datetime(df_scott['end_ts']), "town"]).size().reset_index()
scott_end=scott_end.rename(columns={0: "number_of_times"})
scott_end=scott_end.sort_values(by=['number_of_times'], ascending=False)
ed_scott_end=scott_end[scott_end['town'].isin(["Edinburgh"])].reset_index()
ed_scott_end.groupby(["end_ts"]).sum().sort_values(by=['number_of_times'], ascending=False)
#fig = px.bar(ed_scott_end, x='town', y='number_of_times', color='end_ts', barmode='group', title="Frequency of ending date schedules for Edinburgh")
#fig.show()
Out[111]:
index number_of_times
end_ts
2021-10-31 10:00:00+00:00 1558 1030
2021-10-31 19:00:00+00:00 1578 1002
2021-08-30 16:00:00+01:00 1005 432
2021-10-31 16:00:00+00:00 1570 397
2021-08-15 18:40:00+01:00 527 380
... ... ...
2021-09-26 21:10:00+01:00 1244 1
2021-09-26 20:30:00+01:00 1243 1
2021-08-06 19:00:00+01:00 372 1
2021-09-26 19:00:00+01:00 1240 1
2021-10-31 23:15:00+00:00 1591 1

1526 rows × 2 columns

In [112]:
fig = px.histogram(ed_scott_start, x='start_ts', y="number_of_times", title="Histogram of Schedules Starting Dates for Edinburgh")
fig.show()
In [113]:
fig = px.histogram(scott_start, x='start_ts', y="number_of_times", title="Histogram of Schedules Starting Dates for Scottish Cities")
fig.show()
In [114]:
fig = px.histogram(scott_end, x='end_ts', y="number_of_times", title="Histogram of Schedules Ending Dates for Scottish Cities")
fig.show()
In [115]:
fig = px.histogram(scott_end, x="end_ts", y="number_of_times", histfunc="sum", title="Histogram on Date Axes")
fig.update_traces(xbins_size="M1")
fig.update_xaxes(showgrid=True, ticklabelmode="period", dtick="M1", tickformat="%b\n%Y")
fig.update_layout(bargap=0.1)
fig.add_trace(go.Scatter(mode="markers", x=scott_end["end_ts"], y=scott_end["number_of_times"], name="daily"))
fig.show()

6.4.7 Working with Schedule tags, Scottish cities, Starting/End Time

In [116]:
b=df_scott.reset_index(drop=True)
tag_town_time=b[["event_tags", "town", "start_ts", "end_ts"]]
tag_town_time=tag_town_time.explode("event_tags")
tag_town_time
Out[116]:
event_tags town start_ts end_ts
0 Comedy Edinburgh 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00
0 Stand-up Edinburgh 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00
1 Comedy Edinburgh 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00
1 Stand-up Edinburgh 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00
2 Comedy Edinburgh 2021-08-04T18:45:00+01:00 2021-08-07T17:30:00+01:00
... ... ... ... ...
20884 Music Edinburgh 2021-08-27T15:00:00+01:00 2021-08-29T15:00:00+01:00
20885 Music Edinburgh 2021-08-27T15:00:00+01:00 2021-08-29T15:00:00+01:00
20886 Music Edinburgh 2021-08-27T15:00:00+01:00 2021-08-29T15:00:00+01:00
20887 Music Edinburgh 2021-08-27T15:00:00+01:00 2021-08-29T15:00:00+01:00
20888 Music Edinburgh 2021-08-27T15:00:00+01:00 2021-08-29T15:00:00+01:00

60624 rows × 4 columns

In [117]:
scott_tag_end=tag_town_time.groupby([pd.to_datetime(tag_town_time['end_ts']), "event_tags"]).size().reset_index()
scott_tag_end=scott_tag_end.rename(columns={0: "number_of_times"})
scott_tag_end=scott_tag_end.sort_values(by=['number_of_times'], ascending=False)


scott_tag_start=tag_town_time.groupby([pd.to_datetime(tag_town_time['start_ts']), "event_tags"]).size().reset_index()
scott_tag_start=scott_tag_start.rename(columns={0: "number_of_times"})
scott_tag_start=scott_tag_start.sort_values(by=['number_of_times'], ascending=False)
In [118]:
scott_tag_start
Out[118]:
start_ts event_tags number_of_times
724 2021-07-01 13:15:00+01:00 Ghost Tour 990
725 2021-07-01 13:15:00+01:00 History 990
726 2021-07-01 13:15:00+01:00 Kids 990
723 2021-07-01 13:15:00+01:00 Days out 990
727 2021-07-01 13:15:00+01:00 Tours 990
... ... ... ...
3410 2021-09-06 19:00:00+01:00 Music 1
3411 2021-09-06 19:00:00+01:00 Rock & Pop 1
3412 2021-09-06 19:30:00+01:00 Alternative 1
3413 2021-09-06 19:30:00+01:00 Country 1
5478 2021-10-31 23:15:00+00:00 Horror 1

5479 rows × 3 columns
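
Out[118] counts tags per exact starting timestamp, which gives many small groups. As a coarser complementary view, the sketch below builds a month-by-tag table with `pd.crosstab`. The rows are hypothetical and the monthly granularity is our own choice, not something the notebook prescribes.

```python
import pandas as pd

# Hypothetical rows in the shape of tag_town_time (already exploded by tag).
toy = pd.DataFrame({
    "event_tags": ["Comedy", "Stand-up", "Comedy", "Music", "Music"],
    "start_ts": ["2021-08-04T18:45:00+01:00", "2021-08-04T18:45:00+01:00",
                 "2021-07-01T13:15:00+01:00", "2021-08-27T15:00:00+01:00",
                 "2021-08-27T15:00:00+01:00"],
})

# Parse to UTC and keep only the year-month part as a label.
month = pd.to_datetime(toy["start_ts"], utc=True).dt.strftime("%Y-%m")

# Rows: months; columns: tags; cells: number of rows.
table = pd.crosstab(month, toy["event_tags"])
print(table)
```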

6.4.7.1 Frequency of schedules Starting Date in Scottish City

In [119]:
#fig = px.bar(scott_tag_start, x='event_tags', y='start_ts', color='number_of_times', barmode='group', title="Frequency of schedules tags per Scottish City")
#fig.show()

fig = px.scatter(scott_tag_start, x='start_ts', y='number_of_times', title="Frequency of schedules Starting Date in Scottish City.")
fig.show()

6.4.7.2 Frequency of schedules Ending Date in Scottish City

In [120]:
fig = px.scatter(scott_tag_end, x='end_ts', y='number_of_times', title="Frequency of schedules Ending Date in Scottish City.")
fig.show()

6.4.7.3 Scheduled tags and Starting Dates in Scottish City

In [121]:
fig = px.scatter(scott_tag_start, x='start_ts', y='event_tags', title="Scheduled Tags and Starting Dates in Scottish City.")
fig.show()

6.4.7.4 Scheduled Tags and Ending Dates in Scottish City

In [122]:
fig = px.scatter(scott_tag_end, x='end_ts', y='event_tags', title="Scheduled Tags and Ending Dates in Scottish City.")
fig.show()